Identification of Chinese Personal Names in Unrestricted Texts
نویسندگان
چکیده
Automatic identification of Chinese personal names in unrestricted texts is a key task in Chinese word segmentation, and can affect other NLP tasks such as word segmentation and information retrieval, if it is not properly addressed. This paper (1) demonstrates the problems of Chinese personal name identification in some IT applications, (2) analyzes the structure of Chinese personal names, and (3) further presents the relevant processing strategies. The geographical differences of Chinese personal names between Beijing and Hong Kong are highlighted at the end. It shows that variation in names across different Chinese communities constitutes a critical factor in designing Chinese personal name identification algorithm.
منابع مشابه
Identification and Classification of Proper Nouns in Chinese Texts
Various strategies are proposed to identify and classify three types of proper nouns in Chinese texts. Clues from character, sentence and paragraph levels are employed to resolve Chinese personal names. Character, Syllable and Frequency Conditions are presented to treat transliterated personal names, To deal with organization names, keywords, prefix, word association and parts-of-speech are app...
متن کاملEvaluating Chinese-English Translation Systems for Personal Name Coverage
This paper discusses the challenges which Chinese-English machine translation (MT) systems face in translating personal names. We show that the translation of names between Chinese and English is complicated by different factors, including orthographic, phonetic, geographic and social ones. Four existing systems were tested for their capability in translating personal names from Chinese to Engl...
متن کاملA Literary Anthroponomastics of Three Selected African Novels: A Cross Cultural Perspective
Names as markers of identity are a source of a wide variety of information. This paper explores the names of characters to show the sociocultural factors which influence the choice of names and the effects that the names of these characters have on the roles they play. Using a variety of personal names from Ayi Kwei Armah’s Fragments, Buchi Emecheta’s The Joys of Motherhood, a...
متن کاملLanguage modeling of Chinese personal names based on character units for continuous Chinese speech recognition
In this paper, we analyze Chinese personal names to model their statistical phonotactic characteristics for continuous Chinese speech recognition. The analysis showed languagespecific characteristics of Chinese personal names and strongly suggested the advantage of character-unit oriented modeling. A hierarchical language model was composed by reflecting statistical phonotactic characteristics ...
متن کاملKnowledge Extraction For Identification Of Chinese Organization Names
In this paper, a knowledge extraction process was proposed to extract the knowledge for identifying Chinese organization names. The knowledge extraction process utilizes the structure property, statistical property as well as partial linguistic knowledge of the organization names to extract new organizations from domain texts. The knowledge extraction processes were experimented on large amount...
متن کامل